Agent System API
This document describes the Agent System API that powers reactive AI agents, browser automation commands, and integrated execution workflows. It covers endpoint definitions, request/response schemas, authentication requirements, and practical usage patterns for AI-driven automation. It also documents agent-specific request formatting, response handling, error recovery, and client integration examples for browser extensions and external clients.
The API is implemented as a FastAPI application that mounts multiple routers under standardized prefixes. The routers delegate to service classes that orchestrate agent workflows and tool integrations.
Registers routers under /api/*"] end subgraph "Routers" R1["/api/genai/react
react_agent.py"] R2["/api/agent/generate-script
browser_use.py"] R3["/api/genai/health
health.py"] end subgraph "Services" S1["services/react_agent_service.py"] S2["services/browser_use_service.py"] end subgraph "Models" M1["models/requests/*.py"] M2["models/response/*.py"] end A --> R1 A --> R2 A --> R3 R1 --> S1 R2 --> S2 R1 --> M1 R2 --> M1 S1 --> M2 S2 --> M2
Diagram sources
Section sources
Reactive Agent Endpoint: Processes natural language queries with optional chat history, Google access tokens, PyJIIT login payloads, client HTML context, and optional file attachments. Returns a plain text answer.
Browser Automation Script Generator: Accepts a goal, optional target URL, DOM structure, and constraints. Returns a validated JSON action plan or structured errors.
Health Endpoint: Lightweight health check returning a simple status object.
Section sources
The system follows a layered architecture:
API Layer: FastAPI routers expose endpoints and handle request validation.
Service Layer: Business logic orchestrates agent workflows and tool integrations.
Agent Layer: LangGraph-based reactive agent with tool invocation.
Tools Layer: Structured tools for web search, websites, GitHub, YouTube, Gmail, Calendar, PyJIIT, and browser actions.
Client Layer: Extension and external clients send requests and receive responses.
Diagram sources
Reactive Agent Endpoint#
Method: POST
URL: /api/genai/react
Purpose: Answer natural language questions with optional chat history, Google access tokens, PyJIIT session, client HTML context, and optional file attachments.
Authentication: Not enforced at the API level; however, optional tokens enable richer tool usage.
Request Schema: models/requests/crawller.py
question: Required string
chat_history: Optional list of {role, content}
google_access_token: Optional string
pyjiit_login_response: Optional PyJIIT login payload
client_html: Optional raw HTML from the active browser tab
attached_file_path: Optional absolute path to a file to process via Google GenAI SDK
Response Schema: models/response/crawller.py
answer: Plain text string
Behavior:
Validates presence of question.
Optionally attaches a file via Google GenAI SDK and returns model-generated text.
Builds a LangGraph state with system, human, and optional page-context messages.
Executes the reactive agent graph and returns the final assistant message content.
Error Handling:
Raises HTTP 400 for missing question.
Raises HTTP 500 for unhandled exceptions during processing.
Example Usage:
Client composes a request payload with question, optional chat_history, and optional tokens.
Client sends POST to /api/genai/react.
Server responds with answer.
Diagram sources
Section sources
Browser Automation Script Generator#
Method: POST
URL: /api/agent/generate-script
Purpose: Generate a JSON action plan for automating browser tasks based on a goal, optional target URL, DOM structure, and constraints.
Authentication: Not enforced at the API level.
Request Schema: models/requests/agent.py
goal: Required string
target_url: Optional string
dom_structure: Optional dict with keys: url, title, interactive[]
constraints: Optional dict
Response Schema: models/response/agent.py
ok: Boolean
action_plan: Optional dict
error: Optional string
problems: Optional list of validation problem strings
raw_response: Optional raw LLM output snippet
Behavior:
Formats DOM info and constructs a prompt for the LLM.
Invokes the LLM to produce a JSON action plan.
Sanitizes and validates the JSON action plan.
Returns either ok=true with action_plan or ok=false with error/problems/raw_response.
Error Handling:
Returns structured error fields when validation fails.
Returns HTTP 500 for unexpected exceptions.
Diagram sources
Section sources
Health Endpoint#
Method: GET
URL: /api/genai/health
Purpose: Verify service availability.
Authentication: Not enforced.
Response Schema: models/response/health.py
status: String
message: String
Section sources
Agent Execution Workflow (Extension Client)#
The extension composes requests for various agents and executes them. It captures active tab HTML, resolves URLs, and builds payloads tailored to each endpoint.
Diagram sources
Section sources
The API depends on routers, services, and models. The reactive agent integrates with LangGraph and a set of structured tools.
Diagram sources
Section sources
Token Limits: The script generator limits interactive DOM elements to reduce prompt size and avoid excessive tokens.
Async I/O: Services use async LLM invocation and thread pools for tool operations to prevent blocking.
Caching: The reactive agent graph is cached to avoid repeated compilation overhead.
Validation Early Exit: Script generation validates and sanitizes JSON early to fail fast on malformed plans.
[No sources needed since this section provides general guidance]
HTTP 400 Bad Request
Cause: Missing required field (e.g., question or goal).
Resolution: Ensure the payload includes the required fields.
HTTP 500 Internal Server Error
Cause: Unexpected exception in service or agent execution.
Resolution: Inspect server logs; the service returns a generic error message to the client.
Validation Failures for Script Generation
Cause: Generated JSON action plan fails validation.
Resolution: Review problems list in the response and adjust goal/target URL/DOM structure.
Missing Tokens for Tool Access
Cause: Tools requiring Google access tokens or PyJIIT sessions are not usable without proper context.
Resolution: Provide google_access_token or pyjiit_login_response in the request.
Section sources
The Agent System API provides two primary capabilities: answering natural language queries with a reactive agent and generating browser automation scripts from goals and DOM context. The design emphasizes structured request/response schemas, robust validation, and extensible tooling. Clients can integrate via direct HTTP calls or through the extension’s command executor.
[No sources needed since this section summarizes without analyzing specific files]
Endpoint Reference#
POST /api/genai/react
Request: models/requests/crawller.py
Response: models/response/crawller.py
Notes: Supports optional Google access token, PyJIIT session, client HTML, and file attachment.
POST /api/agent/generate-script
Request: models/requests/agent.py
Response: models/response/agent.py
Notes: Returns ok=true with action_plan or ok=false with error and problems.
GET /api/genai/health
Response: models/response/health.py
Agent Message Payload Model#
Request: models/requests/react_agent.py
Response: models/response/react_agent.py
PyJIIT Login Payload Model#
Request: models/requests/pyjiit.py
Client Integration Patterns#
Extension Client: See extension/entrypoints/utils/executeAgent.ts for command parsing, tab context capture, and endpoint routing.
Section sources